Add guide on choosing entity_id and entity_uri for HERD references #2206
bendichter wants to merge 3 commits
Add a documentation page explaining how to populate the entity_id and entity_uri fields when adding HERD external resource references:

- entity_id should be a CURIE (prefix:identifier) whose prefix is registered with bioregistry.io, which maps it to a canonical resolvable URL and avoids ambiguity between overlapping identifier schemes.
- entity_uri should be the URL the CURIE resolves to (lookupable via https://bioregistry.io/<entity_id>).
- Includes a table of commonly used registries (NCBITaxon, ROR, ORCID, UBERON, MBA, HBA, DANDI) with example entity_id/entity_uri pairs.
- Documents the fallback for resources without per-term URLs (e.g. the D99 macaque atlas): put the resource URL in entity_uri and the term's atlas-specific ID in entity_id.

Adds the page to the Resources toctree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | dev | #2206 | +/- |
|---|---|---|---|
| Coverage | 95.29% | 95.29% | |
| Files | 30 | 30 | |
| Lines | 3038 | 3038 | |
| Branches | 450 | 450 | |
| Hits | 2895 | 2895 | |
| Misses | 87 | 87 | |
| Partials | 56 | 56 | |
Per review feedback, map each general concept to the NWB fields it commonly annotates (e.g. species -> Subject.species, people -> NWBFile.experimenter, brain regions -> ElectrodeGroup.location / ImagingPlane.location), so users can connect a concept to where it appears in an NWB file. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thanks @bendichter! This is very helpful and seems to lay the foundation for a more general guide on external resources. I am wondering whether it would be more appropriate for the https://nwb-overview.readthedocs.io website, to provide an entry point from there into the topic of external resources. I think this would probably be its own top-level page, but it would also be useful to add a section to the conversion guide: https://nwb-overview.readthedocs.io/en/latest/conversion_tutorial/user_guide.html# What do you think?
Yes, I agree it could go there. That way it can be a reference for both MATLAB and Python users.
I think this is a great start as is, so we can just move it to nwb-overview and then keep adding to it in additional issues/PRs as we go.
The UBERON, MBA, and HBA rows all mapped to the same location fields. Replace the repeated list with a footnote reference so the field list (ElectrodeGroup.location, ImagingPlane.location, electrodes location column) is written once. list-table cannot span cells, so a shared footnote is the cleanest way to avoid the repetition. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
closed in favor of nwb-overview |
Motivation

Follow-up to the HERD tutorial discussion in #2200 and the June 23 CN+LBNL sync. Users adding external resource references with `HERD.add_ref` need clear guidance on what to put in the `entity_id` and `entity_uri` fields, which is currently undocumented and was a recurring point of confusion (e.g. which of the many `NCBITaxon` / `taxonomy` / `NCBI_TAXON` forms to use, and what to do for atlases without per-term URLs).

This adds a narrative reference page (`docs/source/external_resources_entity_guide.rst`, in the Resources toctree).

Guidance
`entity_id` should be a CURIE (`prefix:identifier`) whose prefix is registered with bioregistry.io. The Bioregistry maps each CURIE to a canonical, resolvable URL and disambiguates the many overlapping identifier schemes.

`entity_uri` should be the URL the CURIE resolves to, which you can look up via `https://bioregistry.io/<entity_id>`.

A table of commonly used registries with verified example `entity_id` → `entity_uri` pairs:

| Registry | entity_id | entity_uri |
|---|---|---|
| NCBITaxon | `NCBITaxon:10090` | `http://purl.obolibrary.org/obo/NCBITaxon_10090` |
| ROR | `ROR:013meh722` | `https://ror.org/013meh722` |
| ORCID | `ORCID:0000-0002-1825-0097` | `https://orcid.org/0000-0002-1825-0097` |
| UBERON | `UBERON:0001950` | `http://purl.obolibrary.org/obo/UBERON_0001950` |
| MBA | `MBA:385` | `https://purl.brain-bican.org/ontology/mbao/MBA_385` |
| HBA | `HBA:4005` | `https://purl.brain-bican.org/ontology/hbao/HBA_4005` |
| DANDI | `DANDI:000015` | `https://dandiarchive.org/dandiset/000015` |

(Each resolved URL was verified against the Bioregistry API.)
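
As a rough illustration of the CURIE guidance, the sketch below splits an `entity_id` into prefix and identifier and builds the Bioregistry lookup URL following the `https://bioregistry.io/<entity_id>` pattern. The helper names (`split_curie`, `bioregistry_lookup_url`) and the regular expression are assumptions made for this example and are not part of hdmf. The code performs no network request and does not check that the prefix is actually registered.

```python
import re

# Lookup pattern taken from the guidance: https://bioregistry.io/<entity_id>
BIOREGISTRY_BASE = "https://bioregistry.io/"

# Assumed, deliberately loose CURIE shape: "<prefix>:<identifier>".
# The (?!//) lookahead rejects full URLs such as "http://..." so that a URI
# is not mistaken for a CURIE.
_CURIE_RE = re.compile(r"^([A-Za-z][A-Za-z0-9_.\-]*):(?!//)(\S+)$")


def split_curie(entity_id):
    """Return (prefix, identifier) or raise ValueError if not CURIE-shaped."""
    match = _CURIE_RE.match(entity_id)
    if match is None:
        raise ValueError(f"not a prefix:identifier CURIE: {entity_id!r}")
    return match.group(1), match.group(2)


def bioregistry_lookup_url(entity_id):
    """Build the Bioregistry URL where the canonical entity_uri can be looked up."""
    split_curie(entity_id)  # only checks the shape, not prefix registration
    return BIOREGISTRY_BASE + entity_id


if __name__ == "__main__":
    for curie in ("NCBITaxon:10090", "ORCID:0000-0002-1825-0097", "MBA:385"):
        print(curie, "->", bioregistry_lookup_url(curie))
```

A form such as `NCBITaxon_10090` or a full URL is rejected by `split_curie`, which is one way to catch the ambiguous variants mentioned in the motivation before they are stored as an `entity_id`.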
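Fallback for resources without per-term URLs (e.g. the macaque D99 atlas): put the resource's overall URL in `entity_uri` and the term's atlas-specific ID in `entity_id`, so every reference still dereferences to something authoritative.

Notes

- `entity_uri` values are inline literals (not hyperlinks), so the `-W` linkcheck CI does not probe them; only the two stable homepages (bioregistry.io, the W3C CURIE spec) are linked.
- Cross-references to `hdmf.common.resources.HERD` / `add_ref` resolve via the existing hdmf intersphinx mapping.
- `nwb-schema` submodule is in a modified state (build fails at config with `No specification for 'BaseImage'`), unrelated to this change. RST structure (table columns, title underlines) was validated separately.

🤖 Generated with Claude Code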
entity_urivalues are inline literals (not hyperlinks), so the-Wlinkcheck CI does not probe them; only the two stable homepages (bioregistry.io, the W3C CURIE spec) are linked. Cross-references tohdmf.common.resources.HERD/add_refresolve via the existing hdmf intersphinx mapping.nwb-schemasubmodule is in a modified state (build fails at config withNo specification for 'BaseImage'), unrelated to this change. RST structure (table columns, title underlines) was validated separately.🤖 Generated with Claude Code